This is an R Markdown document that details one workflow for firing up an EC2 instance, loading RStudio on that instance, and starting a Docker image that has all the spatial dependencies needed for GIS work on AWS. The workflow is specific to Earth Lab, but it can be adapted for any lab group. It mixes steps in the AWS user interface with commands typed at a terminal prompt. Let’s get started.
First things first: log in to your AWS account and click on the EC2 icon under Compute.
The first key step is to launch the correct Amazon Machine Image (AMI). Within the Quick Start tab there are many AMIs that sit on various operating systems, and depending on what you will be running, any of them may be the appropriate choice. For this tutorial we will use a publicly available Docker image produced by the Earth Lab, which sits on an Ubuntu OS. For anyone who has the Earth Lab credentials, this image will automatically be visible. Select it to move on (highlighted in red).
This step may need more thought: choosing the right EC2 instance type depends entirely on the processing you hope to accomplish on AWS. After you have decided how much processing power you need, verify that it is reasonably priced. There is quite a bit of literature out there that explains the different types of instances and how to choose one for the task you want to perform.
I think, generally, the two main variables you are going to be interested in are nodes (listed as vCPUs in the AWS console) and memory. If you want to parallelize your code, the number of nodes will be critical for running the program efficiently. If you have large datasets, you may want to opt for more memory. And, of course, you may need some combination of the two, or have completely different requirements. Making an informed choice is really the main point here.
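Once an instance is running and you are logged in to it, a quick way to confirm that you got the nodes and memory you expected is to ask the machine itself. The following is a minimal sketch, assuming a Linux instance such as the Ubuntu one used here; the function names `cpu_count` and `mem_total_mib` are made up for illustration.

```shell
# Sketch: report the number of CPU cores available on this machine.
# nproc is available on Ubuntu; getconf is a fallback for other systems.
cpu_count() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Sketch: report total memory in MiB, read from /proc/meminfo (Linux only).
# Prints nothing on systems without /proc/meminfo.
mem_total_mib() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
  fi
}

echo "cores: $(cpu_count)"
echo "memory (MiB): $(mem_total_mib)"
```

If the numbers are smaller than your job needs, it is usually easier to stop and pick a larger instance type than to tune the code around the limit.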
This is a step that you can easily bypass. Leave all as default for now.
Leave everything as default for now. 10 GB should be fine for most tasks, but depending on your tasks and the datasets involved you may want to increase the storage. Here is more information on how to choose the type and amount of storage, if you want to increase the capacity.
This is probably the most critical part of this whole workflow. By properly defining the HTTP port, we allow a web browser to reach the RStudio server running in our spatial Docker image on the EC2 instance. Leave the default “SSH” rule as it is, then click “Add Rule” and choose the type “Custom TCP Rule”. Finally, set the port for the “Custom TCP Rule” to 8787.
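For readers who prefer the command line, the same rule can be sketched with the AWS CLI. This is only a sketch under assumptions: it presumes the AWS CLI is installed and configured, and the security group ID and CIDR range below are placeholders, not values from this tutorial. The hypothetical helper `ingress_cmd` only prints the command so you can review it before running it yourself.

```shell
# Hypothetical helper: print the AWS CLI command that would open a TCP port
# on a security group, so it can be reviewed before it is run.
# Arguments: security group ID, port, CIDR range allowed to connect.
ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' "$1" "$2" "$3"
}

# Placeholder values: the group ID and the address range are examples only.
ingress_cmd sg-0123456789abcdef0 8787 203.0.113.10/32
```

Limiting the CIDR range to your own address, rather than opening the port to everyone, is worth considering because anyone who can reach port 8787 can reach the RStudio login page.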
The first time you launch an EC2 instance, you will create a key pair file (.pem), which is effectively the passcode for connecting to your EC2 instances remotely. Once it is generated, put it somewhere on your computer that is out of the way and safe. I recommend something like MyDocs>AWS>file-name.pem. I would also suggest giving it a name that you will recognize and that is specific to you (name, initials, etc.). After that, every other time you create an instance, all you need to do is select the key pair from the drop-down menu. Easy!
Congratulations, you have successfully launched your first EC2 instance on AWS. A screen will appear with the name of the instance and all pertinent information associated with it. In the screenshot below, make note of the Public DNS associated with this particular EC2 instance; we will need it in the next steps:
ec2-52-32-104-200.us-west-2.compute.amazonaws.com
Now, if you do not already have a Docker Hub account, head over there and create one. Then download Docker to your machine and sign in to Docker Hub from the Docker application. If you have a Mac, you will find it in the top menu bar (see below).
Install and load the Docker container: r-spatial-aws
docker pull earthlab/r-spatial-aws
docker run -i -t earthlab/r-spatial-aws /bin/bash
Now we need to access our .pem file, ssh into our EC2 instance, and launch our Docker container containing all of the spatial libraries needed for analysis.
Now, in a terminal, type:
chmod 600 /point/to/path/*.pem
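ssh refuses to use a private key that other users can read, which is why the permissions are tightened first. As a small sketch, the check below confirms that a key file is accessible to its owner only; `key_is_private` is a hypothetical helper name, and the temporary file merely stands in for your real .pem file.

```shell
# Sketch: succeed if a file is readable (and optionally writable) by its
# owner only. ls -l prints the mode string first, e.g. "-rw-------" for 600.
key_is_private() {
  mode=$(ls -l "$1" | cut -c1-10)
  [ "$mode" = "-rw-------" ] || [ "$mode" = "-r--------" ]
}

# Demonstration on a throwaway file standing in for the real key.
demo_key=$(mktemp)
chmod 600 "$demo_key"
if key_is_private "$demo_key"; then
  echo "permissions OK"
fi
rm -f "$demo_key"
```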
Then:
ssh -i /point/to/path/*.pem ubuntu@ec2-52-32-104-200.us-west-2.compute.amazonaws.com
You will need to replace ec2-52-32-104-200.us-west-2.compute.amazonaws.com with the Public DNS of your particular instance.
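One caveat with the `*.pem` wildcard: if the folder ever holds more than one key, the shell expands it to several file names and ssh will not receive the arguments you intended. The sketch below guards against that; `find_key` is a hypothetical helper, and /point/to/path remains a placeholder for your own folder.

```shell
# Sketch: print the single .pem file in a directory, and fail if the
# directory contains none or more than one.
find_key() {
  # Replace this function's arguments with the files matching the pattern.
  set -- "$1"/*.pem
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one .pem file" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage (placeholder path and host; substitute your own):
#   ssh -i "$(find_key /point/to/path)" ubuntu@ec2-52-32-104-200.us-west-2.compute.amazonaws.com
```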
Now let’s launch the Docker container on top of our EC2 instance by running:
docker run -d -p 8787:8787 earthlab/r-spatial-aws
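In that command, -d runs the container in the background and -p 8787:8787 publishes the container's port 8787 on the same port of the EC2 instance, which is the port opened in the security group. To confirm the container actually started, something like the following sketch may help; `container_running` is a hypothetical helper, and it assumes Docker is available on the instance.

```shell
# Sketch: succeed if at least one running container was started from the
# given image. "docker ps -q --filter ancestor=IMAGE" prints matching IDs.
container_running() {
  [ -n "$(docker ps -q --filter "ancestor=$1")" ]
}

# Usage on the EC2 instance, after the docker run command above:
#   container_running earthlab/r-spatial-aws && echo "RStudio container is up"
```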
Now that we have successfully created our EC2 instance, with our spatial Docker container sitting on top of it, let’s kick it off!
Open your favorite web browser.
Copy your Public DNS and append :8787 to it. It should look like:
ec2-52-32-104-200.us-west-2.compute.amazonaws.com:8787
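If you launch instances often, building the address by hand gets tedious. As a small sketch, the hypothetical helper `rstudio_url` below assembles it from the Public DNS; the optional curl line is an assumption that curl is installed on your machine.

```shell
# Sketch: build the RStudio address from a Public DNS name.
# 8787 is the port opened in the security group and published by Docker.
rstudio_url() {
  printf 'http://%s:8787\n' "$1"
}

# Example with the Public DNS from this tutorial; substitute your own.
url=$(rstudio_url "ec2-52-32-104-200.us-west-2.compute.amazonaws.com")
echo "$url"

# Optional reachability check (prints the HTTP status code):
#   curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "$url"
```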
You will now see a login prompt for RStudio:
username: rstudio
password: rstudio
Voila! You are now running a version of RStudio wrapped in a Docker container, all on AWS!